
    Measuring Sociality in Driving Interaction

    Interacting with other human road users is one of the most challenging tasks for autonomous vehicles. To produce congruent driving behaviors, it is essential to recognize and comprehend sociality, encompassing both the implicit social norms and the individualized social preferences of human drivers. To understand and quantify the complex sociality in driving interactions, we propose a Virtual-Game-based Interaction Model (VGIM) parameterized by a social preference measurement, the Interaction Preference Value (IPV). The IPV is designed to capture a driver's relative inclination towards individual rewards over group rewards. We also develop a method for identifying the IPV from observed driving trajectories, with which we assessed human drivers' IPV using driving data recorded in a typical interactive driving scenario, the unprotected left turn. Our findings reveal that (1) human drivers exhibit particular social preference patterns while undertaking specific tasks, such as turning left or proceeding straight, and (2) human drivers may strategically take competitive actions in order to coordinate with others. Finally, we discuss the potential of learning sociality-aware navigation from human demonstrations by incorporating a rule-based, humanlike IPV-expressing strategy into VGIM and optimization-based motion planners. Simulation experiments demonstrate that (1) IPV identification improves motion prediction performance in interactive driving scenarios and (2) the dynamic IPV-expressing strategy extracted from human driving data makes it possible to reproduce humanlike coordination patterns in driving interactions.
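
    The abstract does not give VGIM's reward form; the sketch below is a minimal illustration, assuming the IPV acts as an angular weight between a driver's own reward and the other agent's reward, in the style of social value orientation. The function name and all values are hypothetical, not the paper's formulation.

```python
import numpy as np

def interaction_reward(r_self: float, r_other: float, ipv: float) -> float:
    """Blend individual and group rewards by a social preference angle.

    Hypothetical illustration: ipv = 0 is purely self-interested,
    ipv = pi/4 weights self and other equally (cooperative), and
    negative values are competitive (the other's gain is penalized).
    """
    return np.cos(ipv) * r_self + np.sin(ipv) * r_other

# A competitive driver (negative IPV) discounts the other agent's reward:
print(interaction_reward(r_self=1.0, r_other=0.5, ipv=-np.pi / 8))
# A prosocial driver weights both rewards:
print(interaction_reward(r_self=1.0, r_other=0.5, ipv=np.pi / 4))
```

    Under such a parameterization, identifying a driver's IPV from an observed trajectory amounts to finding the angle whose induced game-theoretic behavior best explains the recorded motion.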

    Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

    Scene text recognition has been studied for decades due to its broad applications. However, despite Chinese characters possessing characteristics different from Latin characters, such as complex inner structures and large category counts, few methods have been proposed for Chinese Text Recognition (CTR). In particular, the large number of categories poses challenges for zero-shot and few-shot Chinese characters. In this paper, inspired by the way humans recognize Chinese texts, we propose a two-stage framework for CTR. First, we pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS). This pre-training stage simulates how humans recognize Chinese characters and yields a canonical representation of each character. The learned representations are then employed to supervise the CTR model, so that traditional single-character recognition can be improved to text-line recognition through image-IDS matching. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on both Chinese character recognition (CCR) and CTR. The experimental results demonstrate that the proposed method performs best in CCR and outperforms previous methods in most scenarios of the CTR benchmark. Notably, the proposed method can recognize zero-shot Chinese characters in text images without fine-tuning, whereas previous methods require fine-tuning when new classes appear. The code is available at https://github.com/FudanVI/FudanOCR/tree/main/image-ids-CTR.
    Comment: ICCV 2023
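
    A minimal sketch of the kind of CLIP-style alignment objective the first stage describes: a symmetric InfoNCE loss over a batch of paired image and IDS embeddings. The encoders, dimensions, and function name are hypothetical; only the alignment idea comes from the abstract.

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(img_emb: torch.Tensor,
                              ids_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pairing character images with IDS strings.

    img_emb, ids_emb: (batch, dim) embeddings from hypothetical image and
    IDS encoders; row i of each tensor describes the same character.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    ids_emb = F.normalize(ids_emb, dim=-1)
    logits = img_emb @ ids_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0))        # matching pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

loss = clip_style_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
```

    Because IDS decompositions cover unseen characters, embeddings aligned this way can match zero-shot characters without retraining the recognizer, which is consistent with the zero-shot behavior the abstract reports.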

    Orientation-Independent Chinese Text Recognition in Scene Images

    Scene text recognition (STR) has attracted much attention due to its broad applications. Previous works focus on recognizing Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Unlike Latin texts, many Chinese texts in natural scenes are vertical, which poses difficulties for current state-of-the-art STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images, thereby robustly recognizing both horizontal and vertical texts in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover the corresponding printed character images from the disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance by disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves a 45.63% improvement on VCTR when CIRN is added to the baseline model.
    Comment: IJCAI 2023
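
    A toy sketch of the disentangle-then-reconstruct idea: split a visual feature into a content code and an orientation code, then require a decoder to rebuild the printed character image from both. All module names, layer sizes, and the 32x32 glyph resolution are hypothetical; the real CIRN architecture is not specified in the abstract.

```python
import torch
import torch.nn as nn

class DisentangledReconstructor(nn.Module):
    """Toy content/orientation disentanglement with glyph reconstruction."""

    def __init__(self, feat_dim: int = 256, content_dim: int = 128,
                 orient_dim: int = 8):
        super().__init__()
        self.to_content = nn.Linear(feat_dim, content_dim)  # character identity
        self.to_orient = nn.Linear(feat_dim, orient_dim)    # orientation code
        self.decoder = nn.Sequential(
            nn.Linear(content_dim + orient_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 32 * 32),  # flattened 32x32 printed glyph
        )

    def forward(self, feat: torch.Tensor):
        content = self.to_content(feat)
        orient = self.to_orient(feat)
        recon = self.decoder(torch.cat([content, orient], dim=-1))
        return content, orient, recon.view(-1, 1, 32, 32)

model = DisentangledReconstructor()
content, orient, recon = model(torch.randn(4, 256))
```

    The recognizer can then consume only the content code, which is what makes the extracted features orientation-independent.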

    An Analytical Model for Predicting the Stress Distributions within Single-Lap Adhesively Bonded Beams

    An analytical model for predicting the stress distributions within single-lap adhesively bonded beams under tension is presented in this paper. By combining the governing equations of each adherend with the joint kinematics, the overall system of governing equations can be obtained. Both the adherends and the adhesive are assumed to be under plane strain conditions. With suitable boundary conditions, the stress distribution of the adhesive in the longitudinal direction is determined.
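
    The abstract does not reproduce the governing equations. As a hedged illustration of the class of closed-form result such models yield, the classical Volkersen shear-lag idealization (a simpler model than the paper's plane-strain formulation) gives the adhesive shear stress along the overlap as:

```latex
% Volkersen-type shear-lag sketch (a classical idealization, not the
% paper's model). E_i, t_i: adherend moduli and thicknesses;
% G_a, t_a: adhesive shear modulus and thickness.
\[
  \frac{\mathrm{d}^2 \tau}{\mathrm{d}x^2} - \omega^2 \tau = 0,
  \qquad
  \omega^2 = \frac{G_a}{t_a}\left(\frac{1}{E_1 t_1} + \frac{1}{E_2 t_2}\right),
\]
\[
  \tau(x) = A\cosh(\omega x) + B\sinh(\omega x),
\]
% where A and B follow from the load-transfer conditions at the overlap
% ends; the shear stress therefore peaks at the edges of the bondline.
```

    Models of the kind described in the abstract refine this picture by retaining both adherend bending and the plane-strain response of the adhesive layer.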

    On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems

    Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems, and it has garnered increasing research attention of late. However, a significant drawback persists: its poor data efficiency, stemming from its interactive nature. Training reinforcement learning-based recommender systems demands expensive online interactions to amass the trajectories agents need to learn user preferences. This inefficiency makes reinforcement learning-based recommender systems a formidable undertaking and necessitates the exploration of potential solutions. Recent strides in offline reinforcement learning offer a new perspective: agents can glean insights from offline datasets and then deploy the learned policies in online settings. Given that recommender systems possess extensive offline datasets, the offline reinforcement learning framework aligns with them seamlessly. Despite being a burgeoning field, work on recommender systems utilizing offline reinforcement learning remains limited. This survey aims to introduce and delve into offline reinforcement learning within recommender systems, offering an inclusive review of the existing literature in this domain. Furthermore, we strive to underscore prevalent challenges, opportunities, and future pathways, poised to propel research in this evolving field.
    Comment: under review
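
    As a minimal illustration of the offline setting the survey covers, the sketch below performs one fitted-Q update purely from logged recommendation tuples, with no environment calls. All shapes, names, and the toy data are hypothetical; practical offline RL methods add conservatism (e.g. a CQL-style penalty) to cope with distribution shift from the logging policy.

```python
import torch
import torch.nn as nn

state_dim, n_items, gamma = 16, 100, 0.9
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_items))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A logged batch: user state, recommended item, observed reward, next state.
s = torch.randn(32, state_dim)
a = torch.randint(0, n_items, (32,))
r = torch.rand(32)
s_next = torch.randn(32, state_dim)

with torch.no_grad():
    target = r + gamma * q_net(s_next).max(dim=1).values
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)  # trained offline, no interaction
opt.zero_grad(); loss.backward(); opt.step()
```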

    Human Health Indicator Prediction from Gait Video

    Body Mass Index (BMI), age, height, and weight are important indicators of human health, providing useful information for many practical purposes such as health care, monitoring, and re-identification. Most existing methods for health indicator prediction mainly use front-view body or face images. These inputs are hard to obtain in daily life and often make the models less robust, given their strict requirements on view and pose. In this paper, we propose to predict health indicators from gait videos, which are far more prevalent in surveillance and home monitoring scenarios. However, the study of health indicator prediction from gait videos using deep learning has been hindered by the small amount of open-source data. To address this issue, we analyse the similarity and relationship between the pose estimation and health indicator prediction tasks, and then propose a paradigm that enables deep learning on small health indicator datasets by pre-training on the pose estimation task. Furthermore, to better suit the health indicator prediction task, we propose the Global-Local Aware aNd Centrosymmetric Encoder (GLANCE) module. It first extracts local and global features by progressive convolutions and then fuses multi-level features, in two different ways, through a centrosymmetric double-path hourglass structure. Experiments demonstrate that the proposed paradigm achieves state-of-the-art results for predicting health indicators on MoVi, and that the GLANCE module is also beneficial for pose estimation on 3DPW.
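
    A minimal sketch of the pre-train-then-transfer paradigm described above: a shared backbone is first trained on pose estimation, where labels are plentiful, and then reused with a small regression head for the scarce health-indicator labels. All module names and sizes are hypothetical, and this is not the GLANCE architecture itself.

```python
import torch
import torch.nn as nn

# Hypothetical shared backbone producing a 32-d frame embedding.
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

pose_head = nn.Linear(32, 17 * 2)  # e.g. 17 2-D joints per frame
health_head = nn.Linear(32, 4)     # BMI, age, height, weight

frames = torch.randn(8, 3, 64, 64)
pose_pred = pose_head(backbone(frames))  # stage 1: train backbone + pose head

for p in backbone.parameters():          # stage 2: optionally freeze backbone,
    p.requires_grad = False              # fit only the small health head
health_pred = health_head(backbone(frames))
```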